Search CORE

71 research outputs found

Putting Context into Schema Matching

Author: Bohannon Philip
Elnahrawy Eiman
Fan Wenfei
Flaster Michael
Publication date: 01/01/2006

LegoDB: customizing relational storage for XML documents

Author: Bohannon Philip
Freire Juliana
Publication venue: Very Large Data Base Endowment Inc. (VLDB)
Publication date: 01/01/2002

Journal Article

XML is becoming the predominant data exchange format in a variety of application domains (supply chain, scientific data processing, telecommunication infrastructure, etc.). Not only is an increasing amount of XML data now being processed, but XML is also increasingly being used in business-critical applications. Efficient and reliable storage is an important requirement for these applications. By relying on relational engines for this purpose, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors.

The University of Utah: J. Willard Marriott Digital Library

Bridging the XML-relational divide with LegoDB: a demonstration

Author: Bohannon Philip
Freire Juliana
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

Journal Article

We present LegoDB, a cost-based XML storage mapping engine that automatically explores a space of possible XML-to-relational mappings and selects an efficient mapping for a given application.

The University of Utah: J. Willard Marriott Digital Library

From XML schema to relations: a cost-based approach to XML storage

Author: Bohannon Philip
Freire Juliana
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2002

Journal Article

As Web applications manipulate an increasing amount of XML, there is a growing interest in storing XML data in relational databases. Due to the mismatch between the complexity of XML's tree structure and the simplicity of flat relational tables, there are many ways to store the same document in an RDBMS, and a number of heuristic techniques have been proposed. These techniques typically define fixed mappings and do not take application characteristics into account. However, a fixed mapping is unlikely to work well for all possible applications. In contrast, LegoDB is a cost-based XML storage mapping engine that explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this paper, we describe the LegoDB storage engine and provide experimental results that demonstrate the effectiveness of this approach.
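Step 3 of the abstract, picking the best configuration by estimated cost, can be sketched as a weighted argmin over candidate mappings. This is a minimal illustration under stated assumptions, not LegoDB's implementation: the `Configuration` class, `workload_cost`, `select_best`, the configuration names, query ids, frequencies, and per-query costs are all hypothetical. LegoDB generates candidates through XML Schema rewritings and obtains costs from a relational optimizer; neither step is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Configuration:
    """A candidate XML-to-relational mapping (hypothetical representation)."""
    name: str
    # Estimated cost of each workload query under this mapping, keyed by query id.
    # LegoDB obtains such estimates from a relational optimizer; the numbers
    # used below are hypothetical placeholders.
    query_costs: Dict[str, float]


def workload_cost(config: Configuration, workload: Dict[str, float]) -> float:
    """Weighted workload cost: sum of (query frequency * estimated query cost)."""
    return sum(freq * config.query_costs[q] for q, freq in workload.items())


def select_best(configs: List[Configuration], workload: Dict[str, float]) -> Configuration:
    """Pick the configuration with the lowest estimated workload cost."""
    return min(configs, key=lambda c: workload_cost(c, workload))


# Hypothetical workload: Q1 runs 7 times for every 3 runs of Q2.
workload = {"Q1": 7, "Q2": 3}
configs = [
    Configuration("inline-all", {"Q1": 10.0, "Q2": 50.0}),       # 7*10 + 3*50 = 220
    Configuration("outline-reviews", {"Q1": 12.0, "Q2": 20.0}),  # 7*12 + 3*20 = 144
]
best = select_best(configs, workload)
print(best.name)  # outline-reviews
```

Weighting by query frequency is what lets the choice depend on the application: with the same two mappings, a workload dominated by Q1 would favor `inline-all` instead.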

The University of Utah: J. Willard Marriott Digital Library

Querying XML with update syntax

Author: Bohannon Philip
Cong Gao
Fan Wenfei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2007

This paper investigates a class of transform queries proposed by XQuery Update [6]. A transform query is defined in terms of XML update syntax. When posed on an XML tree T, it returns another XML tree that would be produced by executing its embedded update on T, without destructive impact on T. Transform queries support a variety of applications, including XML hypothetical queries, the simulation of updates on virtual views, and the enforcement of XML access control. In light of the wide range of applications for transform queries, we develop automaton-based techniques for efficiently evaluating transform queries and for computing their compositions with user queries in standard XQuery. We provide (a) three algorithms to implement transform queries without change to existing XQuery processors, (b) a linear-time algorithm, based on a seamless integration of automaton execution and SAX parsing, to evaluate transform queries on large XML documents that existing XQuery engines have difficulty handling, and (c) an algorithm to rewrite the composition of user queries and transform queries into a single efficient query in standard XQuery. We also present experimental results comparing the efficiency of our evaluation and composition algorithms for transform queries.
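The non-destructive semantics of a transform query can be illustrated with the naive copy-then-update strategy. In XQuery Update syntax such a query looks like `copy $x := $T modify delete nodes $x//price return $x`. The Python sketch below assumes that example query; the function `transform_delete` and the sample document are hypothetical, and deep-copying the whole tree is exactly the cost that the paper's automaton-based and SAX-based algorithms aim to avoid, so this is not their method.

```python
import copy
import xml.etree.ElementTree as ET


def transform_delete(root: ET.Element, tag: str) -> ET.Element:
    """Naive transform query: return a copy of the tree with every element
    named `tag` deleted (the root itself is never deleted). The input tree is
    left untouched, mirroring the non-destructive semantics of transform queries."""
    result = copy.deepcopy(root)
    # ElementTree has no parent pointers, so snapshot all elements first and
    # remove matching children from each parent.
    for parent in list(result.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return result


doc = ET.fromstring(
    "<db><item><name>a</name><price>10</price></item>"
    "<item><name>b</name><price>20</price></item></db>"
)
view = transform_delete(doc, "price")
print(ET.tostring(view, encoding="unicode"))
# <db><item><name>a</name></item><item><name>b</name></item></db>
print(len(doc.findall(".//price")))  # 2: the original tree is unchanged
```

Because `view` is a separate tree, a user query can be run against it (for example, as a hypothetical "what if prices were hidden" view) while `doc` still serves other queries.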

CiteSeerX

Edinburgh Research Explorer

Incremental Evaluation of Schema-directed XML Publishing

Author: Bohannon Philip
Choi Byron
Fan Wenfei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2004

Edinburgh Research Explorer

Conditional Functional Dependencies for Data Cleaning

Author: Bohannon Philip
Fan Wenfei
Geerts Floris
Jia Xibei
Kementsietsidis Anastasios
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

We propose a class of constraints, referred to as conditional functional dependencies (CFDs), and study their applications in data cleaning. In contrast to traditional functional dependencies (FDs) that were developed mainly for schema design, CFDs aim at capturing the consistency of data by incorporating bindings of semantically related values. For CFDs we provide an inference system analogous to Armstrong’s axioms for FDs, as well as consistency analysis. Since CFDs allow data bindings, a large number of individual constraints may hold on a table, complicating detection of constraint violations. We develop techniques for detecting CFD violations in SQL as well as novel techniques for checking multiple constraints in a single query. We experimentally evaluate the performance of our CFD-based methods for inconsistency detection. This not only yields a constraint theory for CFDs but is also a step toward a practical constraint-based method for improving data quality.
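To make "FDs with data bindings" concrete, the sketch below checks one CFD, written as an embedded FD X → Y plus a pattern tableau, over an in-memory list of rows. It is an illustration under assumptions, not the paper's SQL-based detection technique: the `_` wildcard marker, the functions `matches` and `cfd_violations`, and the customer rows are hypothetical. A single row violates a pattern when it matches the pattern on X but conflicts with a constant on Y; two rows violate it together when they agree on X, match the pattern there, and differ on Y.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

WILDCARD = "_"  # unnamed variable in a pattern tuple: matches any value

Row = Dict[str, str]


def matches(value: str, pattern: str) -> bool:
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows: List[Row], lhs: List[str], rhs: List[str],
                   tableau: List[Row]) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Check the CFD (lhs -> rhs, tableau) and return (single, pairs):
    single: rows matching a pattern on lhs but conflicting with a constant on rhs;
    pairs:  row pairs that agree on lhs, match a pattern there, and differ on rhs."""
    single: Set[int] = set()
    pairs: Set[Tuple[int, int]] = set()
    for tp in tableau:
        # Rows to which this pattern tuple applies.
        applicable = [i for i, r in enumerate(rows)
                      if all(matches(r[a], tp[a]) for a in lhs)]
        for i in applicable:
            if not all(matches(rows[i][a], tp[a]) for a in rhs):
                single.add(i)
        # Group applicable rows by their lhs values; within a group, rhs must agree.
        groups: Dict[Tuple[str, ...], List[int]] = {}
        for i in applicable:
            groups.setdefault(tuple(rows[i][a] for a in lhs), []).append(i)
        for members in groups.values():
            for i, j in combinations(members, 2):
                if any(rows[i][a] != rows[j][a] for a in rhs):
                    pairs.add((i, j))
    return single, pairs


# Hypothetical customer rows: country code, area code, zip, street, city.
rows = [
    {"CC": "44", "AC": "131", "ZIP": "EH4 1DT", "STR": "Mayfield", "CT": "EDI"},
    {"CC": "44", "AC": "131", "ZIP": "EH4 1DT", "STR": "Crichton", "CT": "NYC"},
    {"CC": "01", "AC": "908", "ZIP": "07974", "STR": "Mtn Ave", "CT": "MH"},
    {"CC": "01", "AC": "908", "ZIP": "07974", "STR": "Main St", "CT": "MH"},
]

# phi1: when CC = 44, ZIP determines STR; rows with other country codes are exempt.
print(cfd_violations(rows, ["CC", "ZIP"], ["STR"],
                     [{"CC": "44", "ZIP": WILDCARD, "STR": WILDCARD}]))
# (set(), {(0, 1)})

# phi2: a constant binding: CC = 44 and AC = 131 imply CT = EDI.
print(cfd_violations(rows, ["CC", "AC"], ["CT"],
                     [{"CC": "44", "AC": "131", "CT": "EDI"}]))
# ({1}, {(0, 1)})
```

Rows 2 and 3 share a ZIP but differ on STR, yet they do not violate phi1: the pattern binds CC to 44, so the dependency is only enforced on that part of the table. A plain FD ZIP → STR could not express that restriction.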

CiteSeerX

Crossref

Edinburgh Research Explorer